Simple coarse-grained models, such as the Gaussian Network Model, have been shown to capture
some of the features of equilibrium protein dynamics. We extend this model by using atomic contacts
to define residue interactions and introducing more than one interaction parameter between residues.
We use B-factors from 98 ultra-high resolution X-ray crystal structures to optimize the interaction
parameters. The average correlation between GNM fluctuation predictions and the B-factors is 0.64
for the data set, consistent with a previous large-scale study. By separating residue interactions
into covalent and noncovalent, we achieve an average correlation of 0.74, and addition of ligands
and cofactors further improves the correlation to 0.75. However, further separating the noncovalent
interactions into nonpolar, polar, and mixed yields no significant improvement. The addition of
simple chemical information results in better prediction quality without increasing the size of the
coarse-grained model.

Introduction

Proteins reliably self-organize into specific shapes that are essential for their function. The coordinates that are
reported as protein structures, however, are the average positions of an ensemble of fluctuating conformers that
constitute the native state. It is becoming increasingly accepted that protein structures define specific types of
motions that play important roles in protein function. The mechanism is rarely clear, however, owing in part to the
difficulty of direct observation of protein motions. Crystals can be subjected to time-resolved experiments [1], but the
range of applications is limited to reactions that can be triggered by light or trapped by clever manipulations. NMR
spectroscopy can be used to determine both the structure and the dynamics of proteins [3], but it is limited both by
the maximum size of protein structures and by the difficulty of discriminating between slowly and quickly exchanging
dynamics [2]. Mass spectrometry coupled with hydrogen/deuterium exchange and proteolysis has been used to determine
changes in the relative solvent accessibility of amide hydrogens [4], and single-molecule experiments using optical
trapping have resulted in spectacular observations of the motion of motor proteins [5]. Overall, direct measurement
of molecular motion remains laborious and limited.

Computational methods have been utilized for several decades to study the motion of proteins [6], but the
computational cost of all-atom force fields remains too high for studying many interesting large-scale systems. One
strategy for modeling the dynamics of folded proteins is to simplify the complicated all-atom potentials to a quadratic
function in the vicinity of the native state. The quadratic form allows for decomposition of the motions into vibrational
modes with different frequencies, known as normal modes, and this approach has been widely used in computational
studies of macromolecules since its introduction over two decades ago [7, 8, 9]. One of its advantages has been in
determining the concerted motions that involve large parts of the protein, which correspond to the lowest-frequency
modes. These "global" modes have been used to predict protein flexibility [10] and to study the mechanism of protein
function where protein motion plays a key role [11].

Coarse-grained models, which are based on a simplified representation of protein structure, have been used
historically to study the physics of folding and conformational changes in biomolecules [12]. They remain attractive today,
despite the exponential growth in computing power, because both the size of molecular structures being determined
and the volume of structural data have increased at a similar rate. A class of simple coarse-grained models known as
Elastic Network Models, which are based on Hookean spring interactions, has been in use for a decade [13], with some
success at capturing features of protein dynamics [14]. These models define spring-like interactions between residues
closer than a certain cutoff distance, which gives good agreement with overall flexibility profiles for protein structures.

X-ray crystallography has been responsible for determination of the vast majority of protein structures to date.
Conformational changes can also be observed, for instance from multiple structures of a protein under different
conditions, or as multiple conformations within a single crystal seen in high-resolution structures. Crystallography also
provides a measure of mobility through refinement of Debye-Waller temperature factors, or B-factors, for individual
atoms. This parameter is a measure of uncertainty in atomic position, and incorporates model error, lattice defects,
and other experimental sources of noise in atomic position, in addition to positional variance due to internal protein
motion. The noise contributions to the B-factor are large in low-resolution structures, but are far less prominent in
well-refined high-resolution crystal data. Numerous studies have found good agreement between the B-factors and
other experimental dynamic measures, as well as with computational predictions from molecular dynamics simulations.

The study of protein conformational dynamics benefits from an interplay between experimental data and
computational modeling. A number of studies have compared the predictions of directionality and magnitude of motion from
normal mode analysis with observed conformational changes, but typically the studies have focused on individual
structures. Only recently have computational capabilities advanced enough to process large data sets easily, and
sufficient experimental data been amassed, to perform large, systematic validations of computational models of protein
dynamics. Gerstein and co-workers [15] have compared the predictions of directionality of motion with 377 structures of
proteins in two conformations. Teasdale and co-workers [16] predicted B-factors from sequence information over a set
of 766 protein chains. Halle computed residue flexibility from packing density considerations for a set of 38 structures,
and compared them with B-factors [17]. Zhou and co-workers used an all-atom model developed for studying folding
pathways to predict the flexibility of 18 structures [18]. The predictions of a residue-level elastic network model called
the Gaussian Network Model (GNM) [19] were systematically tested on a set of B-factors from 113 crystal structures
[20]; that study found that GNM performed substantially better than a rigid-body model of protein motion.

This paper presents a systematic extension of GNM by incorporating chemical information into the coarse-grained 
model. We optimized and validated this model, called the Chemical Network Model, using a data set of B-factors 
from 98 of the highest resolution crystal structures in the Protein Data Bank. We test the effect of stepwise addition 
of several chemical parameters, and increase its complexity until no further gains in predictive power are obtained. 



Theory and Methods 

The Gaussian Network Model (GNM) has been described in detail elsewhere [19]; briefly, it defines a potential
based on the distance between Cα atoms. Residue pairs within a cutoff distance R_c are connected by Hookean spring
potentials (Figure 1). The resulting Hessian, also known as the Kirchhoff matrix, contains diagonal elements equal to
the number of contacts for residue i, while the off-diagonal elements are -1 if there is a contact between residues i and
j, and 0 otherwise. If R_ij is the distance between the Cα atoms of residues i and j, then the Hessian matrix elements
are defined as follows:

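$$
H_{ij} =
\begin{cases}
-1 & \text{if } i \neq j \text{ and } R_{ij} \le R_c \\
0 & \text{if } i \neq j \text{ and } R_{ij} > R_c \\
-\sum_{k \neq i} H_{ik} & \text{if } i = j
\end{cases}
\qquad (1)
$$
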
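
As an illustration of equation (1), the following is a minimal Python sketch that builds the Kirchhoff matrix from a list of Cα coordinates. It is not the authors' code (the paper used Perl and MATLAB); the function name `gnm_kirchhoff`, the use of NumPy, and the toy coordinates are assumptions made for this example.

```python
import numpy as np

def gnm_kirchhoff(ca_coords, cutoff=7.5):
    """Build a GNM Kirchhoff (contact) matrix from C-alpha coordinates (in Angstrom)."""
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise distances R_ij between all C-alpha atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Off-diagonal elements: -1 for pairs within the cutoff, 0 otherwise.
    H = -(dist <= cutoff).astype(float)
    np.fill_diagonal(H, 0.0)
    # Diagonal elements: number of contacts of residue i (minus the row sum).
    np.fill_diagonal(H, -H.sum(axis=1))
    return H

# Hypothetical toy input: four "residues" on a line, 4 Angstrom apart,
# so that only sequence neighbours fall within the 7.5 Angstrom cutoff.
toy_coords = [[0, 0, 0], [4, 0, 0], [8, 0, 0], [12, 0, 0]]
H = gnm_kirchhoff(toy_coords, cutoff=7.5)
```

Each row of the resulting matrix sums to zero, which is why the Hessian has a zero eigenvalue and must be pseudo-inverted rather than inverted.
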
We modify and extend the Gaussian Network Model in two ways. First, we define residue contact based on the
closest distance between nonhydrogen atoms of the two residues, instead of only considering Cα atoms. Thus, we use
the positions of all atoms to determine the interaction potential at the residue level. Second, we introduce different
classes of residue interactions, with distinct Hookean spring constants. If H_a is the Kirchhoff (contact) matrix for class
a, the total Hessian matrix for the harmonic model is a linear combination of the matrices, with k_a as the interaction
constant for each class, for example, for two classes of bonded and nonbonded interactions:

$$
H = \sum_a k_a H_a = k_{\mathrm{bonded}} H_{\mathrm{bonded}} + k_{\mathrm{nonbonded}} H_{\mathrm{nonbonded}}
$$

The constants are determined by fitting predicted fluctuations against a data set of crystallographic B-factors, as
described below. The total Hessian is then diagonalized to find the normal modes, or eigenvectors u_k, and the
corresponding frequencies ω_k: H u_k = ω_k² u_k. The decomposition allows us to compute both self- and cross-correlation
of motion between residues from the covariance matrix, which is proportional to the pseudo-inverse of the Hessian
[21]. Specifically, we are interested in the positional variances, or the mean square fluctuations of residues, which are
determined as follows (Δx_i is the deviation of the position of residue i from the mean, u_ki is the i-th element of the
k-th normal mode, and the sum runs over the modes with nonzero frequency):

$$
\langle \Delta x_i^2 \rangle \propto \sum_k \frac{u_{ki}^2}{\omega_k^2}
$$

Note that the modes with the lowest frequencies make the greatest contribution to residue mobility, so a small fraction
of all the modes is sufficient to obtain a good approximation of the sum.
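
The combination of class matrices and the mode sum for the fluctuations can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `kirchhoff_from_pairs` and `mean_square_fluctuations`, the relative threshold used to discard zero modes, and the five-residue toy network are assumptions of this example, and the proportionality constant is set to 1.

```python
import numpy as np

def kirchhoff_from_pairs(n, pairs):
    """Kirchhoff matrix of one interaction class from a list of contacting residue pairs."""
    H = np.zeros((n, n))
    for i, j in pairs:
        H[i, j] -= 1.0
        H[j, i] -= 1.0
        H[i, i] += 1.0
        H[j, j] += 1.0
    return H

def mean_square_fluctuations(class_matrices, constants, n_modes=None):
    """Relative mean square fluctuation of each residue from the weighted Hessian."""
    # Total Hessian: linear combination of the class matrices, H = sum_a k_a H_a.
    H = sum(k * Ha for k, Ha in zip(constants, class_matrices))
    # Eigenvalues are squared frequencies; columns of `modes` are the normal modes.
    evals, modes = np.linalg.eigh(H)
    # Discard zero-frequency modes (threshold is an assumption of this sketch).
    keep = evals > 1e-8 * evals.max()
    evals, modes = evals[keep], modes[:, keep]
    # Optionally keep only the lowest-frequency modes.
    if n_modes is not None:
        evals, modes = evals[:n_modes], modes[:, :n_modes]
    # Sum over modes of u_ki^2 / omega_k^2 for every residue i.
    return (modes ** 2 / evals).sum(axis=1)

# Hypothetical toy network: a five-residue chain plus two nonbonded contacts.
H_bonded = kirchhoff_from_pairs(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
H_nonbonded = kirchhoff_from_pairs(5, [(0, 2), (1, 4)])
msf_all = mean_square_fluctuations([H_bonded, H_nonbonded], [1.0, 0.1])
msf_low = mean_square_fluctuations([H_bonded, H_nonbonded], [1.0, 0.1], n_modes=2)
```

With all nonzero modes included, the result equals the diagonal of the pseudo-inverse of the total Hessian; `msf_low` shows the truncated sum over the two lowest-frequency modes.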

We used Perl programs to parse PDB files and determine residue contact matrices based on atomic coordinates.
To determine the optimal cutoff parameters, a range of Cα cutoff distances was used, from 6 to 12 Å; similarly,
nearest-atom cutoff distances were varied from 3.5 to 9 Å. Copies of the protein molecule surrounding the structure
in the crystal were generated using the symexp command in PyMOL [22]. In both GNM and the Chemical Network
Model (CNM), the crystal environment was taken into account by adding interactions between residues involved in
crystal contacts, without explicitly adding crystal copies to the matrix. Since the model does not contain directional
information, this is a more precise approximation of the effect of the crystal lattice than explicit inclusion of the
first layer of crystal neighbors. For the nearest-atom method, the interactions of atoms modeled in more than one
position were counted in proportion to their occupancy. The matrices were diagonalized using the MATLAB computing
environment [23]. The predicted mean square fluctuations (MSF) were computed as a sum over all the normal modes, as
in the fluctuation equation above, and then compared to a set of experimental B-factors.
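
The nearest-atom contact criterion can be illustrated with a short Python sketch. It assumes the heavy-atom coordinates of each residue have already been parsed into arrays; the function name `nearest_atom_contacts` and the toy coordinates are hypothetical, and the occupancy weighting and crystal contacts described above are deliberately left out.

```python
import numpy as np

def nearest_atom_contacts(residue_atoms, cutoff=4.0):
    """Boolean residue contact map based on the closest heavy-atom distance."""
    n = len(residue_atoms)
    contact = np.zeros((n, n), dtype=bool)
    for i in range(n):
        a = np.asarray(residue_atoms[i], dtype=float)
        for j in range(i + 1, n):
            b = np.asarray(residue_atoms[j], dtype=float)
            # All atom-atom distances between residue i and residue j.
            d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
            contact[i, j] = contact[j, i] = (d.min() <= cutoff)
    return contact

# Hypothetical toy residues; the first atom of each plays the role of C-alpha.
residues = [
    [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]],    # residue 0: C-alpha plus a side-chain tip
    [[11.0, 0.0, 0.0], [7.5, 0.0, 0.0]],   # residue 1: side chain points at residue 0
    [[0.0, 6.0, 0.0]],                     # residue 2: C-alpha only
]
contact = nearest_atom_contacts(residues, cutoff=4.0)
ca = np.array([r[0] for r in residues])
ca_dist_01 = np.linalg.norm(ca[0] - ca[1])   # far apart by C-alpha distance
ca_dist_02 = np.linalg.norm(ca[0] - ca[2])   # close by C-alpha distance
```

In this toy case residues 0 and 1 touch through their side chains although their Cα atoms are far apart, while residues 0 and 2 have nearby Cα atoms but no atoms within the nearest-atom cutoff.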

The data set was obtained by searching the Protein Data Bank for protein structures determined by X-ray
crystallography at a resolution of 1.0 Å or better, containing at least 50 residues in a single chain. Structures with
more than 50% sequence identity were discarded, leaving 98 non-redundant proteins. These are structurally diverse,
representing all major SCOP families, as shown in the comprehensive table in Supplemental Materials. Isotropic Cα
B-factors from each structure were normalized to mean 1, to enable simultaneous fitting over multiple structures. The
B-factors from atoms with more than one conformer or occupancy less than 1 were not used for fitting or validation,
due to the linkage between occupancy and the B-factor. The usable data set consists of 20942 B-factors. In addition,
non-protein residues were considered for a subset of structures with ligands or cofactors other than those from
crystallization buffers or precipitants. For the 68 structures with ligands or cofactors, separate calculations were
performed with and without including the non-protein molecules in the model. Each molecule, whether large or a single
metal ion, was considered as a single residue and included in the Hessian, but only the B-factors from protein residues
were compared with the predictions.
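
The data preparation described above can be sketched as follows. The function names, the toy numbers, and the choice to normalize after discarding unusable atoms are assumptions of this example; the paper does not specify the order of these two steps.

```python
import numpy as np

def usable_mask(occupancy, n_conformers):
    """True for atoms with full occupancy and a single modeled conformer."""
    occupancy = np.asarray(occupancy, dtype=float)
    n_conformers = np.asarray(n_conformers)
    return (occupancy >= 1.0) & (n_conformers == 1)

def normalize_to_unit_mean(values):
    """Rescale values so that their mean is 1."""
    values = np.asarray(values, dtype=float)
    return values / values.mean()

def correlation(predicted, observed):
    """Pearson correlation between predicted fluctuations and B-factors."""
    return float(np.corrcoef(predicted, observed)[0, 1])

# Hypothetical C-alpha records for a five-residue fragment.
b_factors = np.array([12.0, 8.0, 20.0, 10.0, 30.0])
occupancy = np.array([1.0, 1.0, 0.5, 1.0, 1.0])
n_conformers = np.array([1, 1, 2, 1, 1])
predicted = np.array([0.9, 0.7, 2.0, 0.8, 1.6])   # made-up model fluctuations

mask = usable_mask(occupancy, n_conformers)
b_norm = normalize_to_unit_mean(b_factors[mask])
r = correlation(predicted[mask], b_norm)
```

Because the Pearson correlation is invariant to scaling, the normalization does not change the correlation for a single structure; it matters only when B-factors from several structures are pooled.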



We determine interaction constants that maximize the correlation between computed fluctuations and
crystallographic B-factors. Since there is no analytic expression for the fluctuations as a function of the spring
constants, standard gradient-based optimization techniques are not applicable, and we use parameter-space search
methods. The first model consists of two classes of residue interactions: bonded and nonbonded. Because we test for
correlation, scaling is immaterial, so the bonded parameter was set to 1, and only the nonbonded interaction constant
was varied. We use a simple search over a range of values from 0.01 to 1 for the nonbonded constant.
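
A one-dimensional scan of this kind might look like the following Python sketch. It is a toy illustration, not the authors' procedure: the helper names, the six-residue network, and the synthetic "B-factors" (generated from the model itself with a nonbonded constant of 0.1) are all assumptions, and the network is assumed to be connected so that exactly one zero mode is skipped.

```python
import numpy as np

def kirchhoff_from_pairs(n, pairs):
    """Kirchhoff matrix of one interaction class from contacting residue pairs."""
    H = np.zeros((n, n))
    for i, j in pairs:
        H[i, j] -= 1.0
        H[j, i] -= 1.0
        H[i, i] += 1.0
        H[j, j] += 1.0
    return H

def msf(H):
    """Relative mean square fluctuations; skips the single zero mode of a connected network."""
    evals, modes = np.linalg.eigh(H)
    return (modes[:, 1:] ** 2 / evals[1:]).sum(axis=1)

def scan_nonbonded(H_bonded, H_nonbonded, b_factors, k_values):
    """Correlation with B-factors for each nonbonded constant (bonded constant fixed at 1)."""
    scores = [np.corrcoef(msf(H_bonded + k * H_nonbonded), b_factors)[0, 1]
              for k in k_values]
    return k_values[int(np.argmax(scores))], scores

n = 6
H_bonded = kirchhoff_from_pairs(n, [(i, i + 1) for i in range(n - 1)])
H_nonbonded = kirchhoff_from_pairs(n, [(0, 2), (1, 4)])
# Synthetic target generated with a known constant, so the scan has a known answer.
synthetic_b = msf(H_bonded + 0.1 * H_nonbonded)
k_grid = [0.01, 0.05, 0.1, 0.15, 0.25, 0.5, 1.0]
best_k, scores = scan_nonbonded(H_bonded, H_nonbonded, synthetic_b, k_grid)
```

In a real application the score for each constant would be the correlation averaged over all structures in the data set rather than over a single toy network.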

We expand the model to include further chemical categories, specifically polar interactions, nonpolar interactions,
and those that do not fall in either category. These were defined by the types of the nearest atoms for a residue
pair. Nitrogens or oxygens less than 3.3 Å apart were classified as a polar contact. The cutoff distances for the other
two categories were varied from 3.5 Å to 9 Å: the nonpolar category, which is defined as two carbon atoms, with
the exception of backbone carbons and certain charged carbons, such as those in carboxyl groups, and the mixed
category, which includes any other atom pairs. To find the maximum correlation by varying the three parameters we
utilized a standard parameter-space search method, the simplex algorithm [24]. It involves evolving a polyhedral
region (simplex) in parameter space in an effort to enclose the optimal point. The algorithm was implemented in MATLAB
and applied to three training sets of 15 structures, while the remaining 53 structures served as a test set for unbiased
assessment of the optimized parameters.
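
A three-parameter simplex search can be sketched with SciPy's Nelder-Mead implementation. This is a hedged illustration rather than the authors' MATLAB code: the helper names, the eight-residue toy network, the synthetic training target, and the choice to search in log space (to keep the constants positive) are all assumptions of this example.

```python
import numpy as np
from scipy.optimize import minimize

def kirchhoff_from_pairs(n, pairs):
    """Kirchhoff matrix of one interaction class from contacting residue pairs."""
    H = np.zeros((n, n))
    for i, j in pairs:
        H[i, j] -= 1.0
        H[j, i] -= 1.0
        H[i, i] += 1.0
        H[j, j] += 1.0
    return H

def msf(H):
    """Relative mean square fluctuations; skips the single zero mode of a connected network."""
    evals, modes = np.linalg.eigh(H)
    return (modes[:, 1:] ** 2 / evals[1:]).sum(axis=1)

def negative_mean_correlation(log_k, structures):
    """Objective: minus the average correlation over the training structures."""
    k = np.exp(log_k)   # log-space parameters keep the constants positive
    corrs = []
    for H_bonded, H_classes, b in structures:
        H = H_bonded + sum(ki * Hi for ki, Hi in zip(k, H_classes))
        corrs.append(np.corrcoef(msf(H), b)[0, 1])
    return -float(np.mean(corrs))

# Hypothetical training "structure": bonded chain plus polar, nonpolar and mixed contacts.
n = 8
H_bonded = kirchhoff_from_pairs(n, [(i, i + 1) for i in range(n - 1)])
H_classes = [kirchhoff_from_pairs(n, [(0, 2), (3, 6)]),   # polar
             kirchhoff_from_pairs(n, [(1, 4), (2, 7)]),   # nonpolar
             kirchhoff_from_pairs(n, [(0, 5), (4, 7)])]   # mixed
true_k = [0.15, 0.10, 0.05]
target = msf(H_bonded + sum(k * H for k, H in zip(true_k, H_classes)))
training = [(H_bonded, H_classes, target)]

x0 = np.log([0.1, 0.1, 0.1])   # start with all nonbonded constants at 0.1
start_corr = -negative_mean_correlation(x0, training)
result = minimize(negative_mean_correlation, x0, args=(training,), method="Nelder-Mead")
fitted_k = np.exp(result.x)
final_corr = -result.fun
```

As in the text, constants fitted on a training set should then be evaluated on structures that were not used for fitting, since an improvement on the training set alone does not show that the extra parameters generalize.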

Results 

The Gaussian Network Model represents all residue interactions within a cutoff distance between Cα atoms as
identical harmonic potentials. We introduce two modifications to the model to better represent the chemistry of
residue interactions. First, interaction types are separated into classes with different strengths, or spring constants,
to model the diversity of residue interactions in protein structures. Second, inter-residue contacts are defined by the
closest distance between atom pairs, rather than the distance between Cα atoms. Figure 2 demonstrates how a Cα
distance cutoff of 7.3 Å can miss a strong ring-stacking interaction, but may include a weak contact instead. While
all atoms are considered in determining residue interactions, the size of the Hessian matrix produced by the model is
equal to the number of residues in the structure, as in GNM.

The results demonstrate that a combination of the two modifications results in significantly larger improvement
than either one alone. Tables I and II show the results for the Cα distance cutoff and the nearest-atom distance cutoff,
respectively. Average correlations over the entire data set were computed for a range of cutoff distances and a number
of nonbonded interaction constants. The Cα distance method benefits from separation of interaction types, especially for
the larger cutoff distances, in which large numbers of contacts are included. The improvement is greatest for the
combination of a nearest-atom cutoff of 4 Å and a nonbonded parameter of 0.1, giving an average correlation of 0.743.
This is significantly higher than the best GNM prediction of 0.643, at a Cα cutoff of 7.5 Å. The improvement is seen
in almost every structure, listed in Supplemental Materials. Thus, the combination of the two modifications, termed
the Chemical Network Model (CNM), raises the average correlation by 0.10 (from 0.643 to 0.743). The GNM results are
consistent with an earlier large-scale study [20], which found an average correlation of 0.66 at a cutoff of 7.3 Å. As in
that work, crystal contacts were included in the models, as described in Methods, and resulted in significantly improved
agreement (data not shown).

In both GNM and CNM results, there is considerable variation in the quality of fluctuation prediction across structures.
One hypothesis is that the elastic network models are best suited for dense, globular structures, and are less accurate
for sparsely packed residues on the protein surface [25]. Table III presents a breakdown of results for structures with
different fractions of surface residues, defined as those with fewer than 3 nonbonded contacts in CNM. We see that
structures with the lowest and highest fractions of surface residues show significantly lower average correlations in
both GNM and CNM. Contrary to expectation, the structures with the lowest fraction of surface residues have the worst
predictions in both models, but also show the greatest improvement from GNM to CNM (from 0.495 to 0.648). The
standard deviation of prediction quality is also higher in the high and low surface fraction sets. Figure 3 illustrates
the variability of model agreement with plots of normalized fluctuation profiles and the B-factors for the two structures
with the best and the worst correlation with CNM predictions. PDB structure 1J0P, with a CNM correlation of 0.46, is a
small bacterial cytochrome c3 with 4 embedded hemes, and as a result has an unusually high fraction of surface residues,
0.31. On the other hand, the best prediction is seen for PDB ID 2BW4, with a CNM correlation of 0.9. This is a
nitrite reductase that has a well-packed globular fold, with the exception of a long C-terminal tail that is packed by
crystal contacts, with an overall surface fraction of 0.10. In addition, we observe a positive effect of larger protein size
on prediction quality for both methods, as shown in Table IV.
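
The surface-residue definition used here (fewer than 3 nonbonded contacts) is simple to compute from the nonbonded Kirchhoff matrix, whose diagonal counts the contacts of each residue. The sketch below is illustrative only; the function names and the five-residue toy contact list are assumptions.

```python
import numpy as np

def kirchhoff_from_pairs(n, pairs):
    """Kirchhoff matrix of one interaction class from contacting residue pairs."""
    H = np.zeros((n, n))
    for i, j in pairs:
        H[i, j] -= 1.0
        H[j, i] -= 1.0
        H[i, i] += 1.0
        H[j, j] += 1.0
    return H

def surface_fraction(H_nonbonded, min_contacts=3):
    """Fraction of residues with fewer than `min_contacts` nonbonded contacts."""
    counts = np.diag(H_nonbonded)   # diagonal = number of contacts per residue
    return float(np.mean(counts < min_contacts))

# Hypothetical toy case: residues 0-3 all touch each other, residue 4 touches only residue 0.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
H_nonbonded = kirchhoff_from_pairs(5, pairs)
frac = surface_fraction(H_nonbonded)
```

Here only residue 4 has fewer than 3 contacts, so one residue out of five counts as surface.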

We extend the classification of residue interactions by separating the nonbonded category into polar, nonpolar, and
mixed. Separate Kirchhoff matrices were computed for each category, and optimal interaction constants for each were
found by the simplex method, as described in Methods. Three training sets of 15 structures were used for optimization,
and fluctuation predictions were computed using the optimal parameter sets and compared on a separate test set of
53 structures; the results are shown in Table V. Although some improvement can be seen from optimization on the
training sets, and larger improvement can be seen in individual structures (data not shown), the correlation over the
reference set using the optimized parameters is lower than that with all the nonbonded parameters set to 0.1. The
increase in correlation in individual sets is only a result of fitting imperfections of the coarse-grained model for
particular structures, not evidence of general differences in interaction strength. Thus, there is not a sufficient
distinction between the different types of interactions to warrant including them as separate categories.

Several other factors were considered in order to further improve the prediction quality. The presence of cofactors
or ligands in the crystal structures can affect the mobility of the neighboring residues. The results presented above
omitted the non-protein residues, and when the subset of structures with ligands or cofactors is compared to those
without, the group without non-protein residues has a slightly higher average correlation. Addition of cofactors and
ligands, as described in Methods, improves the average correlation for the ligand-containing group from 0.740 to 0.748,
similar to the value of 0.749 for the ligand-free group. Thus, the consideration of non-protein residues results in a
small but measurable improvement in mobility prediction. We also briefly report the modifications of
the model that either yielded no improvement or were detrimental to the prediction quality. They include making
the interaction between residues proportional to the number of atom pairs within interaction range, adding
mass-weighting to the Hessian matrix, and introducing a new interaction category for residues within the same secondary
structure element.

The lowest-frequency normal modes and their eigenvalues from GNM and CNM were compared. Figure 4 shows
the mean dot product between corresponding normal modes and the ratio of the eigenvalues, normalized to the lowest
value. We see that the lowest-frequency modes are quite similar, but progressively diverge at higher frequencies,
with little similarity remaining by normal mode 10. This demonstrates that the two methods share an overall gross
structure, which is reflected in the lowest-frequency modes, but the details of contact selection and interaction strengths
play a greater role at higher frequencies. Still, the differences are not negligible, and the improved predictive power
of CNM suggests that its normal modes are more accurate, as well.
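
A mode-by-mode comparison of this kind can be sketched as below. The function names and the eight-residue toy Hessians are assumptions of this example; following the Figure 4 caption, the ratio is taken between frequencies (square roots of eigenvalues) normalized to the lowest nonzero frequency of each model, and both networks are assumed to be connected so that a single zero mode is skipped.

```python
import numpy as np

def kirchhoff_from_pairs(n, pairs):
    """Kirchhoff matrix of one interaction class from contacting residue pairs."""
    H = np.zeros((n, n))
    for i, j in pairs:
        H[i, j] -= 1.0
        H[j, i] -= 1.0
        H[i, i] += 1.0
        H[j, j] += 1.0
    return H

def low_modes(H, n_modes):
    """Lowest nonzero eigenvalues and modes (skips the single zero mode)."""
    evals, modes = np.linalg.eigh(H)
    return evals[1:n_modes + 1], modes[:, 1:n_modes + 1]

def compare_modes(H_a, H_b, n_modes=3):
    """Per-mode overlap and normalized frequency ratio between two models."""
    ev_a, m_a = low_modes(H_a, n_modes)
    ev_b, m_b = low_modes(H_b, n_modes)
    # Absolute dot product: the sign of an eigenvector is arbitrary.
    overlap = np.abs((m_a * m_b).sum(axis=0))
    f_a = np.sqrt(ev_a / ev_a[0])
    f_b = np.sqrt(ev_b / ev_b[0])
    ratio = np.minimum(f_a, f_b) / np.maximum(f_a, f_b)   # lower to higher
    return overlap, ratio

# Hypothetical toy models sharing a chain but weighting nonbonded contacts differently.
n = 8
H_bonded = kirchhoff_from_pairs(n, [(i, i + 1) for i in range(n - 1)])
H_nonbonded = kirchhoff_from_pairs(n, [(0, 2), (1, 4), (2, 7)])
H_uniform = H_bonded + 1.0 * H_nonbonded     # all springs equal
H_weighted = H_bonded + 0.1 * H_nonbonded    # weaker nonbonded springs
overlap, ratio = compare_modes(H_uniform, H_weighted, n_modes=3)
same_overlap, same_ratio = compare_modes(H_uniform, H_uniform, n_modes=3)
```

An overlap near 1 means the two models predict essentially the same collective motion for that mode index; comparing a model with itself gives overlaps and ratios of exactly 1.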

Discussion 

Simple models of complex systems serve at least two purposes. Practically, they offer efficient computation, enabling
approximate treatment of objects that are beyond the current computational capabilities of more realistic methods.
For instance, the dynamics of large macromolecular assemblies are still prohibitively expensive to be treated by
all-atom molecular dynamics. Coarse-grained potentials provide an opportunity to quantitatively study systems such as
viral capsids [26] and the ribosome [27], which play critical biological roles. The second advantage of simple modeling
is that it sharpens our understanding. Beginning with the most basic assumptions and gradually adding details, one
can arrive at a minimal set of key variables that describe an opaque reality. This was the approach taken by this
study.

The Gaussian Network Model has been successful at predicting the features of collective protein motions, as
evidenced by comparison of fluctuation profiles with crystal B-factors and NMR relaxation data [28], as well as by
prediction of conformational changes from low-frequency normal modes [29]. A previous large-scale study [20] has
systematically assessed its agreement with crystal B-factors, finding an average correlation of 0.66, while a rigid-body
model obtained a correlation of 0.52. This provided clear evidence that the contact topology of protein structures
plays a key role in determining the near-native dynamics. However, there is room for improvement of the correlation
coefficient, and this motivated our chemistry-based coarse-grained model of protein dynamics.

The Chemical Network Model rests on the assumption that atomic contacts are the primary means of inter-residue
interaction. We construct the Hessian matrix at the residue level from atomic information present in crystal structures.
Further, we divide the interactions into classes, first into bonded and nonbonded, and then split the nonbonded
category. Simplified residue-level force fields which distinguish different interaction types have been proposed before,
ranging from Gō-like models for studying folding pathways [30, 31] to the amino-acid specific potential of Miyazawa and
Jernigan [32]. In contrast, our model applies to vibrational fluctuations in the native state, and is distinguished from
these models by its simplicity and the systematic comparison against a large data set of reliable measurements of
protein mobility. Similar modifications of elastic network models were reported very recently: one that strengthened
the bonded interactions in GNM to match the predictions of all-atom normal mode analysis [33], and another [34]
which divided interactions into several types ranging from disulfide bonds to van der Waals contacts to construct an
extension of the anisotropic version of GNM, known as ANM [35]. However, the first study uses a Cα-cutoff potential,
and we demonstrated that the combination of nearest-atom contact potential and different interaction strengths leads
to further improvement. The second study did not justify the values of parameters chosen for the different interaction
types. Finally, both use only a few examples rather than a large data set to validate their models.

Our results show that the nearest-atom contact potential coupled with differentiation of bonded and nonbonded
interactions leads to a synergistic improvement of mobility prediction. The nearest-atom contact potential adds some
contacts missed by GNM, yet excludes other GNM interactions. On average, there are fewer residue contacts in CNM
with the nearest-atom cutoff of 4 Å than in GNM with the optimal cutoff of 7.5 Å. The improvement of contact
selection is apparently counterbalanced by a reduction in contact density, which may be why the nearest-atom contact
potential alone has no significant effect on prediction quality. The introduction of bonded and nonbonded constants
modifies the relative density of contacts to better match the observed residue mobility. We also observe that both
GNM and CNM work best for typical globular structures, and those with a very high or very low fraction of surface
residues show substantially lower prediction quality. This may also explain why larger proteins tend to show better
prediction, since the surface fraction is more stable, and illustrates the suitability of coarse-grained modeling for large
macromolecular assemblies.

Classifying the nonbonded interactions into polar, nonpolar, and mixed did not yield improvement in an unbiased
comparison with a reference set of 53 structures. The correlation coefficient is relatively insensitive to changes in the
interaction parameters: an order of magnitude change between bonded and nonbonded parameters was required to
achieve an improvement of 0.10 in average correlation, and smaller adjustments of the nonbonded parameters have no
significant effect. Although optimization produces substantial improvement in individual structures (data not presented),
these optimizations are apparently not applicable across a wide array of structures.

The failure of the more complex model illustrates both the strengths and the limitations of the coarse-grained
elastic network model. Addition of simple chemical information, together with consideration of crystal contacts and
co-crystallized ligands and cofactors, yields an average correlation of 0.75 with experimental data, with even better
agreement for larger structures. This is solid quantitative predictive power for a model at the residue level, and better
agreement probably requires detailed atomic modeling. The coarseness of the model also leads to its limitation:
additional information is washed out at this scale. This suggests that this class of models is unsuitable for
addressing some important questions, such as the effect of mutations on protein motion [36], which sometimes have
a direct functional link [37].

Prediction of observed fluctuations is only a means of validating the model, not a goal in itself. While computation
of average positional deviation is sometimes useful, the most promising applications of harmonic models have been
the use of low-frequency modes to study persistent collective motions in protein structures. This information has been
used for prediction of mechanisms of functionally significant motions [10, 38, 39] or in quantifying allosteric interaction
between distant parts of a protein structure [40]. Normal modes have enabled the improvement of crystallographic
structure determination by molecular refinement [41] and the refinement of low-resolution structures of large assemblies
[42, 43]. Coarse-grained normal modes are also useful in analyzing the large numbers of newly determined structures,
for instance in the prediction of active sites [44], automated decomposition of protein structures into domains [45], and
determination of networks of residues involved in key conformational changes [46]. While CNM and GNM predict
similar lowest-frequency modes, the improvement in fluctuation prediction suggests that the changes in the modes
are significant, and may provide more accurate prediction of collective motions, especially for large protein structures
and assemblies.

Conclusion 

We have extended GNM by constructing the Hessian contact matrix based on atomic contacts, and separating
residue interactions into bonded and nonbonded. The resulting Chemical Network Model shows considerable
improvement in the prediction of crystallographic B-factors, giving an average correlation of 0.75 on a data set of 98
ultra-high resolution structures. However, further separation of nonbonded interactions into polar, nonpolar, and mixed
did not yield any improvement in the correlation coefficient. We have improved the residue-level elastic network model
without increasing the computational cost, and found an appropriate level of complexity for the application.

I. TABLES

TABLE I: Average correlation of B-factor prediction for Cα distance cutoff models. The cutoff distance is varied across the
columns, and the nonbonded interaction parameter varies by row; the highest correlation for each cutoff value is marked
with an asterisk.

nonbonded   6 Å      6.5 Å    7 Å      7.5 Å    8 Å      9 Å      10 Å     11 Å     12 Å
1.0         0.542    0.578    0.624    0.643    0.629    0.619    0.627    0.634    0.628
0.5         0.552*   0.604    0.639    0.655    0.643    0.630    0.636    0.641    0.633
0.25        0.548    0.615*   0.646*   0.662*   0.655    0.645    0.649    0.652    0.643
0.15        0.540    0.610    0.643    0.661    0.659*   0.656    0.660    0.662    0.652
0.1         0.525    0.597    0.634    0.654    0.658    0.661    0.668    0.670    0.660
0.05        0.490    0.562    0.605    0.631    0.643    0.662*   0.676*   0.682*   0.672*
0.01        0.395    0.451    0.500    0.533    0.558    0.608    0.646    0.668    0.670

TABLE II: Average correlation of B-factor prediction for nearest-atom distance cutoff models. The cutoff distance is varied
across the columns, and the nonbonded interaction parameter varies by row; the highest correlation for each cutoff value is
marked with an asterisk.

nonbonded   3.5 Å    4 Å      4.5 Å    5 Å      6 Å      7 Å      8 Å      9 Å
1.0         0.569    0.649    0.644    0.632    0.630    0.625    0.639    0.633
0.5         0.612    0.685    0.676    0.662    0.652    0.637    0.648    0.640
0.25        0.642    0.717    0.707    0.692    0.677    0.656    0.661    0.651
0.15        0.649*   0.735    0.726    0.713    0.696    0.673    0.674    0.662
0.1         0.642    0.743*   0.737    0.725    0.709    0.688    0.686    0.672
0.05        0.611    0.735    0.738*   0.731*   0.721*   0.709*   0.706*   0.691
0.01        0.497    0.625    0.650    0.654    0.669    0.692    0.704    0.697*

TABLE III: Fraction of surface residues and accuracy of prediction

Surface:           Low            Medium         High           Total
structures         10             78             10             98
residues           1093           18120          1730           20942
surface fraction   0.049          0.103          0.184          0.107
GNM (a)            0.495±0.107    0.657±0.095    0.592±0.099    0.643±0.105
CNM (a)            0.648±0.111    0.752±0.082    0.709±0.099    0.743±0.089

(a) average and standard deviation of correlation over the set

TABLE IV: Effect of protein size on average correlation with GNM and CNM fluctuations

size (residues)   >500     >300     >200     >100     all
structures        4        20       48       73       98
residues          2828     8804     15731    19277    20942
GNM               0.641    0.660    0.664    0.651    0.643
CNM               0.774    0.765    0.760    0.746    0.743

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TABLE V: Optimization of nonbonded interaction parameters over 3 training sets of 15 structures and cross-validation on a 
reference set of 53 structures. 



training set          Set 1    Set 2    Set 3
residues              2249     3805     3229
polar (a)             0.115    0.147    0.129
nonpolar (a)          0.106    0.107    0.049
mixed (a)             0.123    0.072    0.045
training before (b)   0.701    0.740    0.705
training after (b)    0.702    0.752    0.726
reference before (b)  0.761    0.761    0.761
reference after (b)   0.757    0.754    0.740

(a) optimal parameter values for the training set as found by the simplex method 
(b) average correlations with all parameters at 0.1 (before) and with optimized parameters (after) 
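One way to read the cross-validation rows of Table V is to compare the change in average correlation on each training set with the change on the reference set. The short sketch below only does this arithmetic on the tabulated values; the variable and function names are illustrative.

```python
# Average correlations from Table V as (before, after) optimization.
training = {
    "Set 1": (0.701, 0.702),
    "Set 2": (0.740, 0.752),
    "Set 3": (0.705, 0.726),
}
reference = {
    "Set 1": (0.761, 0.757),
    "Set 2": (0.761, 0.754),
    "Set 3": (0.761, 0.740),
}

def change(pair):
    """Change in average correlation, rounded to the table's precision."""
    before, after = pair
    return round(after - before, 3)

# For each training set: (change on training set, change on reference set).
changes = {name: (change(training[name]), change(reference[name]))
           for name in training}

for name, (d_train, d_ref) in changes.items():
    print(f"{name}: training {d_train:+.3f}, reference {d_ref:+.3f}")
```

With these numbers, each optimized parameter set raises the training correlation slightly (by 0.001 to 0.021) while lowering the reference correlation (by 0.004 to 0.021), which suggests that separately tuned polar, nonpolar, and mixed parameters do not generalize better than a single value of 0.1.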
II. FIGURE CAPTIONS 

Figure 1: Cartoon of calmodulin structure (1EXR) in green with Cα atoms within 7.3 Å connected by magenta 
dotted lines to represent GNM interactions. 

Figure 2: Contrast between residue interactions selected by Cα distance (magenta) and nearest-atom distance 
(blue). A: residues with a strong ring-stacking interaction with Cα distance greater than 7.3 Å. B: residues not in 
chemical contact with Cα distance less than 7 Å. Both examples are from the sperm whale myoglobin structure (1A6M). 

Figure 3: Examples of computed fluctuation profiles and experimental B-factors (normalized). A: Worst prediction, 
1J0P (0.46 CNM, 0.46 GNM). B: Best prediction, 2BW4 (0.9 CNM, 0.84 GNM). 

Figure 4: Comparison of corresponding low-frequency modes from GNM and CNM. The blue curve shows the ratio 
(lower to higher) of the frequencies normalized to the lowest frequency, averaged over the 98 structures. The red curve 
is the average dot product between the corresponding normal modes. Note the fast decline of the normal mode overlap at 
higher frequencies. 
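The dot-product comparison of corresponding GNM and CNM modes described in the caption above can be illustrated with a small sketch. It assumes each mode is a real vector with one component per residue and that the overall sign of a mode is arbitrary, so the absolute value of the normalized dot product is used. The function name and the toy vectors are hypothetical.

```python
import math

def mode_overlap(mode_a, mode_b):
    """Absolute normalized dot product between two mode vectors (0 to 1)."""
    if len(mode_a) != len(mode_b):
        raise ValueError("modes must have the same number of components")
    dot = sum(a * b for a, b in zip(mode_a, mode_b))
    norm_a = math.sqrt(sum(a * a for a in mode_a))
    norm_b = math.sqrt(sum(b * b for b in mode_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    # The sign of a normal mode is arbitrary (assumption), so take abs().
    return abs(dot) / (norm_a * norm_b)

# Toy four-residue modes (hypothetical values).
gnm_mode = [0.5, 0.5, -0.5, -0.5]
cnm_mode = [0.6, 0.4, -0.4, -0.6]
print(f"overlap = {mode_overlap(gnm_mode, cnm_mode):.3f}")
```

An overlap near 1 means the two models describe essentially the same collective motion, while an overlap near 0 means the modes are unrelated.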
I. TABLE OF ALL STRUCTURES 



PDB ID  SCOP family        all residues  usable residues  surface fraction  GNM (7.5 Å)  CNM (4.0 Å)
1a6m    a.1.1.2            151           151              0.079             0.482        0.643
1aho    g.3.7.1            64            62               0.145             0.719        0.749
1brf    g.41.5.1           53            53               0.151             0.472        0.604
1byi    c.37.1.10          224           210              0.105             0.611        0.724
1c75    a.3.1.1            71            69               0.072             0.566        0.649
1c7k    d.92.1.1           132           132              0.068             0.508        0.73
1cex    c.69.1.30          197           197              0.102             0.75         0.766
1dy5    d.5.1.1            248           248              0.105             0.673        0.639
1ea7    c.41.1.1           310           305              0.089             0.662        0.683
1eb6    d.92.1.12          177           177              0.051             0.572        0.799
1exr    a.39.1.5           146           109              0.165             0.717        0.68
1f94    g.7.1.1            63            63               0.079             0.486        0.619
1f9y    d.58.30.1          158           156              0.154             0.441        0.556
1g4i    a.133.1.2          123           112              0.107             0.693        0.769
1g66    c.69.1.30          207           205              0.088             0.714        0.797
1g6x    g.8.1.1            58            55               0.164             0.777        0.886
1ga6    c.41.1.2           371           369              0.133             0.738        0.86
1gci    c.41.1.1           269           260              0.085             0.724        0.792
1gkm    a.127.1.2          509           508              0.1               0.495        0.674
1gqv    d.5.1.1            135           135              0.104             0.599        0.703
1gvk    b.47.1.2           243           222              0.104             0.79         0.893
1gwe    e.5.1.1            498           491              0.075             0.657        0.757
1hj9    b.47.1.2           223           215              0.163             0.503        0.751
1i1w    c.1.8.3            302           299              0.057             0.479        0.663
1ic6    c.41.1.1           279           279              0.097             0.709        0.716
1iqz    d.58.1.4           81            81               0.148             0.648        0.809
1iua    g.35.1.1           83            81               0.074             0.406        0.63
1ix9    a.2.11.1/d.44.1.1  410           381              0.097             0.585        0.72
1ixh    c.94.1.1           321           321              0.112             0.68         0.74
1j0p    a.138.1.1          108           108              0.306             0.48         0.464
1jfb    a.104.1.1          399           375              0.12              0.632        0.681
1k4i    d.115.1.2          216           216              0.102             0.574        0.603
1k5c    b.80.1.3           333           330              0.082             0.693        0.78
1kth    g.8.1.1            58            58               0.172             0.658        0.899
1kwf    a.102.1.2          363           344              0.081             0.639        0.672
1l9l    a.64.1.1           74            74               0.041             0.324        0.518
1lkk    d.93.1.1           109           109              0.101             0.535        0.64
1lni    d.1.1.2            192           165              0.164             0.568        0.659
1lug    b.74.1.1           259           258              0.097             0.644        0.7
1m1q    a.138.1.3          90            74               0.311             0.276        0.56
1m40    e.3.1.1            263           118              0.085             0.713        0.824
1mc2    a.133.1.2          122           122              0.066             0.58         0.514
1mj5    c.69.1.8           297           266              0.079             0.668        0.757
1mn8    a.61.1.6           379           379              0.09              0.652        0.806
1muw    c.1.15.3           386           341              0.073             0.701        0.724
1mwq    d.58.4.7           194           194              0.103             0.414        0.501
1n4w    c.3.1.2/d.16.1.1   498           452              0.104             0.69         0.816
1n55    c.1.1.1            249           244              0.07              0.805        0.814
1nki    d.32.1.2           268           262              0.141             0.712        0.744
1nls    b.29.1.1           237           236              0.136             0.472        0.86
1nqj    b.23.2.1           210           210              0.114             0.766        0.744
1nwz    d.110.3.1          125           109              0.064             0.401        0.507
1o7j    c.88.1.1           1300          1296             0.097             0.596        0.749
1oai    a.5.2.3            68            68               0.015             0.628        0.808
1od3    b.18.1.10          131           131              0.099             0.547        0.774
1oew    b.50.1.2           329           324              0.136             0.655        0.754
1ok0    b.5.1.1            74            65               0.062             0.732        0.753
1p1x    c.1.10.1           501           501              0.096             0.771        0.87
1pjx    b.68.6.1           314           307              0.147             0.789        0.884
1pq7    b.47.1.2           224           214              0.187             0.609        0.795
1q6z    e.23.1.1           524           523              0.105             0.769        0.843
1r2m    b.138.1.1          140           135              0.156             0.704        0.843
1r6j    b.36.1.1           82            70               0.043             0.363        0.484
1rb9    g.41.5.1           52            48               0.146             0.304        0.888
1rtq    c.56.5.4           291           291              0.076             0.573        0.684
1sfd    b.6.1.1            210           210              0.148             0.674        0.626
1ssx    b.47.1.1           198           174              0.115             0.734        0.793
1tg0    b.34.2.1           66            56               0.161             0.792        0.837
1tqg    a.24.10.3          105           71               0.014             0.405        0.575
1tt8    d.190.1.1          164           164              0.079             0.657        0.678
1u2h    b.1.1.4            96            96               0.115             0.53         0.752
1ucs    b.85.1.1           64            64               0.031             0.648        0.67
1ufy    d.79.1.2           121           118              0.093             0.769        0.784
1ug6    c.1.8.4            426           426              0.103             0.738        0.833
1unq    b.55.1.1           117           117              0.111             0.653        0.841
1us0    c.1.7.1            313           259              0.12              0.676        0.709
1v0l    c.1.8.3            302           292              0.086             0.718        0.813
1v6p    g.7.1.1            124           108              0.102             0.528        0.691
1vbw    d.40.1.1           68            62               0.081             0.698        0.766
1vyr    c.1.4.1            363           361              0.119             0.642        0.79
1vyy    b.115.1.1          113           99               0.121             0.648        0.714
1w0n    b.18.1.10          120           116              0.121             0.495        0.6
1x6z    d.24.1.1           119           119              0.092             0.787        0.693
1x8q    b.60.1.1           184           162              0.111             0.597        0.676
1xg0    d.184.1.1          494           470              0.17              0.597        0.666
1xmk    a.4.5.19           79            75               0.093             0.584        0.846
1y55    b.61.1.1           240           229              0.109             0.633        0.721
1ylj    e.3.1.1            263           253              0.079             0.748        0.668
1zk4    c.2.1.2            251           225              0.084             0.747        0.8
1zzk    d.51.1.1           80            78               0.103             0.723        0.701
2bt9    b.24.1.1           266           262              0.164             0.685        0.786
2bw4    b.6.1.3            334           292              0.103             0.831        0.901
2cws    b.29.1.18          227           219              0.132             0.696        0.766
2f01    b.61.1.1           241           238              0.151             0.612        0.74
2fdn    d.58.1.1           55            48               0.146             0.607        0.702
2pvb    a.39.1.4           107           96               0.063             0.452        0.557
3lzt    d.2.1.2            129           126              0.087             0.415        0.688
7a3h    c.1.8.3            300           295              0.085             0.675        0.864
